3. Structure standardization

(c) 2019, Dr. Ramil Nugmanov; Dr. Timur Madzhidov; Ravil Mukhametgaleev

For installation instructions for the CGRtools package and for the tutorial files, see https://github.com/cimm-kzn/CGRtools

NOTE: The tutorial should be run sequentially from the start. Running cells out of order will lead to unexpected results.


In [ ]:
import pkg_resources
if pkg_resources.get_distribution('CGRtools').version.split('.')[:2] != ['3', '1']:
    print('WARNING. Tutorial was tested on 3.1 version of CGRtools')
else:
    print('Welcome!')

In [ ]:
# load data for tutorial
from pickle import load
from traceback import format_exc

with open('molecules.dat', 'rb') as f:
    molecules = load(f) # list of MoleculeContainer objects
with open('reactions.dat', 'rb') as f:
    reactions = load(f) # list of ReactionContainer objects

m1, m2, m3, m4 = molecules # molecule
m12 = m3.copy()
r1 = reactions[0] # reaction

m1.reset_query_marks()
m1.atom(1).isotope = 16
m1.flush_cache()

m1.delete_atom(3)
m1.atom_implicit_h(1)
m1.atom_explicit_h(1)
m1.atom_total_h(1)

3.1. Molecules

MoleculeContainer has standardize and aromatize methods.

The aromatize method transforms the Kekulé representation of rings into the aromatic one.
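To make the idea concrete, here is a minimal, library-independent sketch of what aromatization means for a single ring. It is not the algorithm used by CGRtools: the function name `aromatize_ring`, the list-of-bond-orders representation and the "six-membered alternating ring" rule are all assumptions made for illustration. The value 4 for an aromatic bond follows the MDL molfile bond type convention.

```python
AROMATIC = 4  # MDL molfile bond type code for an aromatic bond


def aromatize_ring(ring_bonds):
    """Toy aromatization of one ring (hypothetical helper, not CGRtools code).

    ring_bonds: bond orders listed in order around the ring.
    A six-membered ring of strictly alternating single (1) and double (2)
    bonds is rewritten to aromatic bonds. Returns (new_bonds, rings_changed).
    """
    n = len(ring_bonds)
    alternating = (
        n == 6
        and set(ring_bonds) == {1, 2}
        and all(ring_bonds[i] != ring_bonds[(i + 1) % n] for i in range(n))
    )
    if alternating:
        return [AROMATIC] * n, 1
    return list(ring_bonds), 0


kekule_benzene = [2, 1, 2, 1, 2, 1]
cyclohexene = [2, 1, 1, 1, 1, 1]
print(aromatize_ring(kekule_benzene))  # ([4, 4, 4, 4, 4, 4], 1)
print(aromatize_ring(cyclohexene))     # ([2, 1, 1, 1, 1, 1], 0)
```

Like the aromatize call used later in this tutorial, the sketch reports how many rings were transformed, so a result of 0 means the structure was left unchanged.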

The standardize method applies functional group standardization rules to molecules. The following rules are implemented (the corresponding reaction SMARTS are given):

• Aromatic N-Oxide   [#7;a:1]=[O:2]>>[#7+:1]-[#8-:2]
• Azide              [#7;A;X2-:1][N;X2+:2]#[N;X1:3]>>[#7:1]=[N+:2]=[#7-:3]
• Diazo              [#6;X3-:1][N;X2+:2]#[N;X1:3]>>[#6;A:1]=[N+:2]=[#7-:3]
• Diazonium          [#6]-[#7:1]=[#7+:2]>>[#6][N+:1]#[N:2]
• Iminium            [#6;X3+:1]-[#7;X3:2]>>[#6;A:1]=[#7+:2]
• Isocyanate         [#7+:1][#6;A-:2]=[O:3]>>[#7:1]=[C:2]=[O:3]
• Nitrilium          [#6;A;X2+:1]=[#7;X2:2]>>[C:1]#[N+:2]
• Nitro              [O:3]=[N:1]=[O:2]>>[#8-:2]-[#7+:1]=[O:3]
• Nitrone Nitronate  [#6;A]=[N:1]=[O:2]>>[#8-:2]-[#7+:1]=[#6;A]
• Nitroso            [#6]-[#7H2+:1]-[#8;X1-:2]>>[#6]-[#7:1]=[O:2]
• Phosphonic         [#6][P+:1]([#8;X2])([#8;X2])[#8-:2]>>[#6][P:1]([#8])([#8])=[O:2]
• Phosphonium Ylide  [#6][P-:1]([#6])([#6])[#6+:2]>>[#6][P:1]([#6])([#6])=[#6;A:2]
• Selenite           [#8;X2][Se+:1]([#8;X2])[#8-:2]>>[#8][Se:1]([#8])=[O:2]
• Silicate           [#8;X2]-[#14+:1](-[#8;X2])-[#8-:2]>>[#8]-[#14:1](-[#8])=[O:2]
• Sulfine            [#6]-[#6](-[#6])=[S+:1][#8-:2]>>[#6]-[#6](-[#6])=[S:1]=[O:2]
• Sulfon             [#6][S;X3+:1]([#6])[#8-:2]>>[#6][S:1]([#6])=[O:2]
• Sulfonium Ylide    [#6][S-:1]([#6])[#6+:2]>>[#6][S:1]([#6])=[#6;A:2]
• Sulfoxide          [#6][S+:1]([#6])([#8-:2])=O>>[#6][S:1]([#6])(=[O:2])=O
• Sulfoxonium Ylide  [#6][S+:1]([#6])([#8-:2])=[#6;A]>>[#6][S:1]([#6])(=[#6;A])=[O:2]
• Tertiary N-Oxide   [#6]-[#7;X4:1]=[O:2]>>[#6]-[#7+:1]-[#8-:2]
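Each rule in the list rewrites one drawing of a functional group into another. As an illustration, the Nitro rule (pentavalent O=N=O into the charge-separated form) can be sketched in plain Python. This is a toy under stated assumptions, not the CGRtools implementation: the function name `standardize_nitro` and the dict-based molecule representation are hypothetical, and only this single rule is covered.

```python
def standardize_nitro(atoms, bonds):
    """Rewrite O=N=O groups into [O-]-[N+]=O (hypothetical helper, not CGRtools code).

    atoms: dict, atom number -> {'element': str, 'charge': int}
    bonds: dict, frozenset({a, b}) -> bond order
    Both dicts are modified in place. Returns the number of rewritten groups.
    """
    fixed = 0
    for n, atom in atoms.items():
        if atom['element'] != 'N' or atom['charge'] != 0:
            continue
        # collect neutral oxygens that are double-bonded to this nitrogen
        double_o = []
        for pair, order in bonds.items():
            if n in pair and order == 2:
                (other,) = pair - {n}
                if atoms[other]['element'] == 'O' and atoms[other]['charge'] == 0:
                    double_o.append(other)
        if len(double_o) == 2:
            o = min(double_o)  # deterministic choice of the oxygen that gets the charge
            bonds[frozenset((n, o))] = 1
            atoms[o]['charge'] = -1
            atom['charge'] = 1
            fixed += 1
    return fixed


# heavy atoms of nitromethane drawn with a pentavalent nitrogen: C-N(=O)=O
atoms = {
    1: {'element': 'C', 'charge': 0},
    2: {'element': 'N', 'charge': 0},
    3: {'element': 'O', 'charge': 0},
    4: {'element': 'O', 'charge': 0},
}
bonds = {frozenset((1, 2)): 1, frozenset((2, 3)): 2, frozenset((2, 4)): 2}

print(standardize_nitro(atoms, bonds))  # 1
print(standardize_nitro(atoms, bonds))  # 0, the group is already standardized
```

Returning the number of rewritten groups mirrors the behaviour described for standardize in the cells below, and makes a second call on an already standardized structure a visible no-op.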

In [ ]:
m12 # molecule with kekulized ring

In [ ]:
m12.aromatize() # aromatizes and returns number of transformed rings

In [ ]:
m12 # cleaned structure. Cache is flushed automatically

In [ ]:
m12.standardize()  # apply standardization. Returns number of transformed groups

In [ ]:
m12

Molecules have explicify_hydrogens and implicify_hydrogens methods to handle hydrogens.

These methods add or remove explicit hydrogens in a molecule.

Note that implicit hydrogen atoms are currently calculated incorrectly for pyrrole-like molecules.
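For readers unfamiliar with implicit hydrogens, the sketch below shows the usual valence bookkeeping behind them. The valence table, the function name `implicit_h` and the formula are simplifying assumptions for neutral atoms only; they are not the rules CGRtools applies.

```python
# default valences of neutral atoms; illustrative subset only (assumption)
DEFAULT_VALENCE = {'C': 4, 'N': 3, 'O': 2}


def implicit_h(element, bond_orders, explicit_h=0):
    """Implicit H count = default valence - sum of heavy-atom bond orders - explicit H."""
    free = DEFAULT_VALENCE[element] - sum(bond_orders) - explicit_h
    return max(free, 0)


print(implicit_h('C', [1]))     # methyl carbon of ethane: 3
print(implicit_h('C', [2, 1]))  # ring CH of Kekule benzene: 1
print(implicit_h('N', [2, 1]))  # pyridine nitrogen, Kekule form: 0
print(implicit_h('N', [1, 1]))  # pyrrole nitrogen, Kekule form: 1
```

In the Kekulé form the two nitrogens differ in their bond orders, so the count comes out right. Once both are drawn with two aromatic bonds, bond orders alone no longer distinguish them, which is a general reason why pyrrole-like rings are hard for hydrogen counting. Whether this is the cause of the limitation noted above is not stated in the source.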


In [ ]:
m1.explicify_hydrogens() # returns the number of added hydrogens

In [ ]:
m1 # coordinates are not calculated for added hydrogen atoms, so the hydrogens appear to share the same position in the image

In [ ]:
m1.implicify_hydrogens()

In [ ]:
m1

CGRtools has an experimental algorithm for 2D geometry calculation. It works well only for small molecules. The algorithm requires the numpy and scipy packages.


In [ ]:
m1.explicify_hydrogens() # add explicit hydrogens
m1.calculate2d() # experimental force field-based 2d geometry calculation.
m1

3.2. Reactions standardization

ReactionContainer has standardize, aromatize, explicify_hydrogens and implicify_hydrogens methods. When called on a reaction, each method is applied to all molecules in that reaction.
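The delegation described above can be pictured with a small stand-alone sketch. `ToyMolecule` and `ToyReaction` are hypothetical stand-ins written for this illustration and do not reproduce ReactionContainer internals; in particular, returning the summed count from the reaction-level method is a choice made for the sketch, not a documented CGRtools behaviour.

```python
class ToyMolecule:
    """Stand-in for a molecule: standardize() reports how many groups it fixed."""

    def __init__(self, fixable_groups):
        self.fixable_groups = fixable_groups

    def standardize(self):
        fixed, self.fixable_groups = self.fixable_groups, 0
        return fixed


class ToyReaction:
    """Stand-in for a reaction that forwards calls to every molecule it holds."""

    def __init__(self, reactants, products, reagents=()):
        self.reactants = tuple(reactants)
        self.products = tuple(products)
        self.reagents = tuple(reagents)

    def molecules(self):
        # iterate over all molecules regardless of their role in the reaction
        yield from self.reactants
        yield from self.products
        yield from self.reagents

    def standardize(self):
        # apply the molecule-level method to each molecule and sum the counts
        return sum(m.standardize() for m in self.molecules())


rxn = ToyReaction([ToyMolecule(1), ToyMolecule(0)], [ToyMolecule(2)])
print(rxn.standardize())  # 3
print(rxn.standardize())  # 0, nothing left to fix
```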


In [ ]:
reactions[2]

In [ ]:
reactions[2].standardize()
reactions[2].explicify_hydrogens()
reactions[2]